A simple but efficient spectral approach for analyzing the community structure of complex networks
is introduced. It works the same way for all types of networks, by spectrally splitting the
adjacency matrix into a “unipartite” and a “multipartite” component. These two matrices reveal 
the structure of the network from different perspectives and can be analyzed at different levels of 
detail. Their entries, or the entries of their lower-rank approximations, provide measures of the 
affinity or antagonism between the nodes that highlight the communities and the “gateway” links 
that connect them together. An algorithm is then proposed to achieve the automatic assignment 
of the nodes to communities based on the information provided by either matrix. This algorithm 
naturally generates overlapping communities but can also be tuned to eliminate the overlaps. 
I. INTRODUCTION


Community structure detection has been one of the most important research topics in network science in recent years. Although no exact definition exists, a community is broadly understood as a set of nodes that “work together to achieve a certain function of the network”. It is usually assumed that there is a correlation between the density of connections and function, namely that subsets of the network whose nodes are more densely connected than in a random “null model” are likely to perform some function together. Alternatively, especially in the case of bipartite or directed networks, a frequently used assumption is that nodes that share many connections are likely to perform a common task. The two assumptions have essentially the same meaning in the case of very densely connected communities, but are otherwise distinct. The method presented in this paper naturally identifies communities defined according to either assumption.

Various methods have been proposed so far to identify the community structure, most of them applying only to unipartite undirected networks. They include divisive algorithms, graph partitioning, hierarchical clustering, partitional clustering [13], spectral clustering [14]-[18], as well as more unusual methods. However, the most commonly used methods are those based on the maximization of a goal function called modularity, introduced by Newman and Girvan. The maximization is achieved using different heuristic approaches like greedy search, extremal optimization, simulated annealing, or spectral bisectioning. The latter has evolved into more sophisticated algorithms, which increase performance [22], [25], [26] or are specifically designed for bipartite networks, directed networks, or networks with overlapping communities. Although community detection algorithms that use modularity as a goal function are known to suffer from a resolution problem which prevents them from detecting communities below a certain size, they are so far the most frequently used in the case of undirected networks with non-overlapping communities because modularity is based on a clear working definition of what it means for such a network to be modular. However, in the case of bipartite or directed networks and especially for networks with overlapping communities there is no universally accepted definition of modularity and there is no way to directly compare the quality of partitions that have been obtained by maximizing different modularity functions. For this reason, it is important to have a community detection method that is independent of a definition of modularity, works the same way in all situations, and produces results compatible with modularity-based methods whenever comparison is meaningful.


The first steps in this direction were taken in Refs. [23], [40]. Although Ref. [40] does not provide a method for identifying the community structure, it is notable for using a truncated singular value decomposition (SVD) of a “contribution matrix” to analyze the structure of predetermined communities and the relationship between them. The algorithm of Ref. [23] identifies the communities by using a singular value decomposition of the unsigned Laplacian matrix for unipartite networks, or of the rectangular adjacency sub-matrix for bipartite networks, followed by the application of a k-means clustering algorithm in the subspace spanned by the left and right singular vectors corresponding to the largest singular values. In this latter regard, it is still very close to the spectral clustering algorithms cited above. This algorithm has the drawback of using different matrices for uni- and bipartite networks and can only identify “unipartite”-type communities (comprising nodes from both parties) on bipartite networks. In addition, Ref. [23] lacks a performance comparison with modularity-based methods in terms of ensemble averages. The community detection method introduced in this paper is simpler and works the same way for all types of networks. It starts by generating two matrices, in which “unipartite” and respectively “multipartite”-type communities (the latter consisting of nodes from a single party) are immediately visible. The entries of these matrices provide a measure of the affinity or antagonism between the different nodes which can be useful by itself (and likely sufficient for many purposes), but can also be used to generate either overlapping or non-overlapping community structures.

Finally, with the exception of [22], all spectral algorithms proposed so far to maximize modularity perform recursive bisections of the network and its communities by using only the leading eigenvalue of the modularity matrix. The bisections must be combined with additional “fine-tuning”, “final tuning” [25] and possibly agglomeration [26] steps, without which the performance of these algorithms would be insufficient. These additional steps do not increase the complexity of the algorithms but require significant extra effort to program. A question of both theoretical and practical importance is whether a different type of spectral algorithm, that uses multiple eigenvectors of the adjacency matrix and is not specifically designed to maximize modularity, still needs such additional steps to achieve good performance. We present results showing that, except for extremely sparse or weakly modular networks, the algorithm proposed in this paper produces good to excellent community structures without additional steps.


II. METHOD 
A. Background 

Let $A$ be the adjacency matrix of a sparse network with $N$ nodes. There is no restriction on whether the network is uni- or bipartite, unweighted or weighted. In the weighted case, $A$ is understood to be the weights matrix. We will assume that the network is undirected, but directed networks can be represented as bipartite undirected ones for the purpose of community structure analysis.

The goal is to partition the network into a set of communities $\{C_k\}$, with $k = 1, \ldots, K$, that makes sense in light of the criteria mentioned in the first paragraph of the Introduction. Although the adjacency matrix is the most straightforward representation of a network, it has so far been considered unfit for the purpose of determining the community structure. The reason for this apparent inability and the way to deal with it are discussed in this section.

Community detection algorithms have been proposed that use either the stochastic matrix or different forms of the network Laplacian, but the most popular algorithms start with the definition of a modularity function. In the case of unipartite undirected networks, modularity is defined as

$$Q = \frac{1}{2m} \sum_{k=1}^{K} \sum_{i,j \in C_k} \left( A_{ij} - \frac{d_i d_j}{2m} \right), \tag{1}$$

where $d_i$ is the degree of node $i$ and $2m = \sum_i d_i$. Modularity is then expressed as

$$Q = \frac{1}{2m} \mathrm{Tr}\left( S^T M S \right), \tag{2}$$

where $M$ is the modularity matrix defined by

$$M_{ij} = A_{ij} - \frac{d_i d_j}{2m} \tag{3}$$

and $S$ is a binary $N \times K$ matrix with $S_{ik} = 1$ if node $i$ belongs to community $k$ and zero otherwise.
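As an illustration of Eqs. (1)-(3), the following minimal sketch evaluates the modularity of a given partition with NumPy. The function name `modularity` and the toy network (two triangles joined by one link) are assumptions introduced here for illustration only and are not taken from the paper.

```python
import numpy as np

def modularity(A, labels):
    """Return Q = Tr(S^T M S) / (2m) for a partition given as integer labels."""
    A = np.asarray(A, dtype=float)
    d = A.sum(axis=1)                    # node degrees (strengths if weighted)
    two_m = d.sum()                      # 2m = sum of all degrees
    M = A - np.outer(d, d) / two_m       # modularity matrix, Eq. (3)
    labels = np.asarray(labels)
    communities = np.unique(labels)
    # Binary N x K membership matrix: S[i, k] = 1 if node i is in community k
    S = (labels[:, None] == communities[None, :]).astype(float)
    return np.trace(S.T @ M @ S) / two_m  # Eq. (2)

# Toy network (hypothetical): two triangles joined by a single link
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

Q_split = modularity(A, [0, 0, 0, 1, 1, 1])   # one community per triangle
Q_whole = modularity(A, [0, 0, 0, 0, 0, 0])   # everything in one community
```

For this toy network the two-triangle partition gives $Q = 5/14$, while placing all nodes in a single community gives $Q = 0$, as expected from Eq. (1).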

In the standard spectral bisectioning algorithm due to Newman, as well as in its variants, $S$ is a column matrix and the network is recursively bisectioned according to the signs of the components of the eigenvector corresponding to the largest eigenvalue of the modularity matrix and then of its modified community-wide version until the modularity function can no longer be increased. There are also “fine-tuning” and “final tuning” [25] steps that can be added at the end of each bisection and at the end of the bisectioning process, respectively, to improve the performance of the algorithm.

Of particular interest are the variants introduced by Guimera and Barber [27], which are both specifically designed to deal with bipartite networks but detect different types of communities. The algorithm of Guimera finds communities that are subsets of only one party. Such communities will be called “bipartite” or “multipartite” in this paper. On the other hand, the algorithm described in Ref. [27] finds cross-party communities, which will be called “unipartite”. As will be seen, the algorithm presented in this paper is capable of detecting both types of communities on bipartite and therefore also on directed networks.

Newman has pointed out the possibility of using more than one eigenvector of the modularity matrix, but this idea has not been pursued until recently [22], [23]. The algorithm proposed in Ref. [22] uses orthonormal rotations in a space spanned by the eigenvectors corresponding to the $K$ largest eigenvalues of the modularity matrix, while Ref. [23] uses a singular value decomposition of the unsigned network Laplacian followed by k-means clustering in a similar space.


B. General description 

On the other hand, it is obvious that the community structure can be regarded as a “coarse-graining” of the network under analysis. The intuition behind the method proposed in this paper is to translate the coarse-graining algebraically into a representation of a community as a square sub-matrix whose entries are all positive or greater than a certain positive threshold, centered on the main diagonal of a simplified adjacency matrix. This makes sense if belonging to a community is viewed as being under the influence of a “center of power”, with all members interacting with each other through it. The problem of identifying the community structure (including the case of overlapping communities) then translates into finding all such sub-matrices that are maximal (not contained within larger ones).

FIG. 1. A simple nearly-bipartite network.

Sub-matrices of the kind described above are nowhere to be found in the adjacency matrices of typical real-world or model networks. Networks composed of sparsely interconnected cliques come closest to this picture but even they have all diagonal elements equal to zero unless self-loops are allowed. In order to obtain a coarse-grained version of the adjacency matrix it seems natural to perform a singular value decomposition $A = U \Sigma V^T$ [41] and then retain only the terms corresponding to the largest $K < N$ singular values,

$$A \approx \sum_{k=1}^{K} \sigma_k U_{:k} V_{:k}^T. \tag{4}$$

Here $U$ and $V$ are orthogonal matrices whose columns are the left and right singular vectors of matrix $A$ while $\Sigma$ is diagonal with non-negative entries $\sigma_k$. This is reminiscent of approaches used in some lossy image compression and face recognition algorithms as well as of the principal component analysis method used in statistics. A low-rank approximation of the adjacency matrix is expected to retain only its most important features, enhancing sets of similar rows or columns, introducing additional links within the densely connected subsets, and weakening the links between them [41]. This is exactly what is needed in order to reveal communities defined either by high density of links or by similarity of connection, as discussed in the first paragraph of the Introduction. Moreover, it is known that retaining the first $K$ singular values from an SVD leads to the best rank-$K$ approximation of the original matrix in terms of Frobenius norm [41]. Everything seems right, and yet, if the method is applied as described above, it gives fair results on some networks but completely fails to identify a meaningful community structure on many.

FIG. 2. (Color online) Split eigenvalue expansions of the adjacency matrix for the network in Fig. 1. Red (solid) and blue (hollow) dots represent positive and negative matrix entries, respectively. The dot in the legend box has unity diameter.

A simple example is the network shown in Fig. 1, which is nearly bipartite except for the link between nodes 9 and 10. The network is shown in two different layouts, which emphasize the unipartite and bipartite communities respectively. The first term of the expansion in Eq. (4) does contain information about the relative importance of the nodes within the network, which is not surprising, since $U_{:1} = V_{:1}$ defines the eigenvector centrality measure. As more terms are added, though, the singular value expansion simply converges towards the adjacency matrix without ever revealing a community structure.

To understand the root of the problem, note first that for real symmetric matrices the singular value decomposition is closely related to the eigenvalue decomposition $A = U \Lambda U^T$: the singular values are the absolute values of the eigenvalues, $\sigma_i = |\lambda_i|$, and any negative eigenvalue signs are transferred to the columns of $U$ on the right to form $V$. Retaining the largest $K$ singular values in an SVD is the same as retaining the largest $K$ eigenvalues in absolute value. However, individual rank-1 terms of the form $\lambda_i U_{:i} U_{:i}^T$ in the eigenvalue expansion of $A$ tell different stories when interpreted in terms of community structure depending on the sign of $\lambda_i$.
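The relation between the two decompositions can be checked numerically. The sketch below uses a hypothetical four-node path graph, chosen here only because it is bipartite and therefore has a spectrum that is symmetric about zero; it is not one of the networks analyzed in the paper.

```python
import numpy as np

# Hypothetical example: a 4-node path graph (bipartite, no zero eigenvalues)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

lam, U = np.linalg.eigh(A)                    # eigenvalues in ascending order
sigma = np.linalg.svd(A, compute_uv=False)    # singular values in descending order

# The singular values coincide with the absolute values of the eigenvalues
same_spectrum = np.allclose(np.sort(np.abs(lam))[::-1], sigma)

# Moving the eigenvalue signs onto the columns of U yields V, so that
# A = U |Lambda| V^T is a valid singular value decomposition
V = U * np.sign(lam)
reconstructed = np.allclose(U @ np.diag(np.abs(lam)) @ V.T, A)
```

Both checks evaluate to `True`, which makes the interference explicit: a truncated SVD cannot tell a term with $\lambda_i > 0$ from one with $\lambda_i < 0$ of the same magnitude.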
FIG. 3. The eigenvalues for Zachary’s karate network. Prominent positive eigenvalues 1 through 4 define the unipartite community structure. Prominent negative eigenvalue 34 defines a bipartite approximation of the network.


If $\lambda_i > 0$, the rank-1 matrix $\lambda_i U_{:i} U_{:i}^T$ has two blocks with positive entries on the main diagonal and two off-diagonal blocks with negative entries, once the nodes are ordered according to the signs of their eigenvector components. This corresponds to a partition of the network into two unipartite-style communities, with the positive matrix elements quantifying affinity and the negative ones quantifying antagonism between the nodes.

If $\lambda_i < 0$, the blocks with positive entries are off-diagonal, which corresponds to a bipartite approximation of the network, with two same-party communities appearing in the negative blocks and the connections between the nodes in the positive ones. This is reminiscent of Newman’s observation that the eigenvector corresponding to the largest negative eigenvalue of the modularity matrix $M$ can be used to discern a (nearly-)bipartite structure.

It is known that bipartite networks have symmetric positive and negative eigenvalues of the adjacency matrix. In addition, many unipartite networks have large negative eigenvalues, of magnitude comparable to the largest positive ones. This means that two mutually exclusive types of community description interfere if one simply performs a singular value decomposition of the adjacency matrix. The key to correctly revealing the community structure of a network based on the adjacency matrix is to spectrally split it into a “unipartite” and a “multipartite” component, the former constructed using exclusively the eigenvectors with positive eigenvalues and the latter the eigenvectors with negative eigenvalues:



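$$A_U = \sum_{\lambda_k > 0} \lambda_k U_{:k} U_{:k}^T, \tag{5}$$

$$A_M = \sum_{\lambda_k < 0} \lambda_k U_{:k} U_{:k}^T. \tag{6}$$

For the purpose of revealing the community structure, we can retain the largest $K_p$ positive eigenvalues and the $N - K_n + 1$ negative eigenvalues that are largest in absolute value. Assuming the eigenvalues are listed in decreasing order, the “coarse-grained” versions of these matrices are

$$A_{\{1-K_p\}} = \sum_{k=1}^{K_p} \lambda_k U_{:k} U_{:k}^T, \tag{7}$$

$$A_{\{K_n-N\}} = \sum_{k=K_n}^{N} \lambda_k U_{:k} U_{:k}^T. \tag{8}$$

FIG. 4. A modular unipartite network with 21 nodes.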

The results of such a spectral split for the network in Fig. 1 are shown in Figs. 2(a) and 2(b). The first matrix reveals communities in “unipartite” mode: nodes from one party that are densely connected as second-order neighbors are lumped together with the first-order neighbors through which they are connected into cross-party communities. The negative entries of the second matrix reveal communities in “bipartite” mode, with nodes from only one party that share neighbors in the other lumped by themselves. The results for this network are discussed in more detail in subsection E.
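A minimal NumPy sketch of the spectral split of Eqs. (5)-(8) is given below. The function name `spectral_split`, its parameters `n_pos`, `n_neg` and `tol`, and the toy network are assumptions made here for illustration; the paper does not prescribe an implementation.

```python
import numpy as np

def spectral_split(A, n_pos=None, n_neg=None, tol=1e-12):
    """Return (A_U, A_M) built from the positive and negative eigenpairs of A.

    n_pos and n_neg (hypothetical names) are the numbers of largest positive
    and most negative eigenvalues to keep; None keeps all of them.
    """
    lam, U = np.linalg.eigh(np.asarray(A, dtype=float))
    order = np.argsort(lam)[::-1]                 # decreasing order, as in the text
    lam, U = lam[order], U[:, order]
    pos = np.flatnonzero(lam > tol)               # positive end of the spectrum
    neg = np.flatnonzero(lam < -tol)              # negative end of the spectrum
    if n_pos is not None:
        pos = pos[:n_pos]                         # largest positive eigenvalues
    if n_neg is not None:
        neg = neg[max(len(neg) - n_neg, 0):]      # most negative eigenvalues
    A_U = (U[:, pos] * lam[pos]) @ U[:, pos].T    # Eq. (5), or Eq. (7) if truncated
    A_M = (U[:, neg] * lam[neg]) @ U[:, neg].T    # Eq. (6), or Eq. (8) if truncated
    return A_U, A_M

# Toy network (hypothetical): two triangles joined by a single link
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

A_U, A_M = spectral_split(A)             # full components
A_U1, _ = spectral_split(A, n_pos=1)     # rank-1 truncation of A_U
```

Without truncation the two components add up to the adjacency matrix, $A_U + A_M = A$, with $A_U$ positive semi-definite and $A_M$ negative semi-definite. In this sketch $K_p$ corresponds to `n_pos` and $N - K_n + 1$ to `n_neg`.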

The interpretation of the eigenvectors of the adjacency matrix as “community modes” is best understood as generalizing the definition of the eigenvector centrality: the eigenproblem $Au = \lambda u$ is interpreted as a self-consistent way of quantifying the centrality of the nodes on a network such that the centrality $u_i$ of node $i$ is proportional to the sum of the centralities of its neighbors. Since centrality measures are assumed to be non-negative, only the eigenvector corresponding to the largest eigenvalue is used to define the classical centrality. On the other hand, if negative eigenvector elements are allowed, the negative signs can be transferred to the elements of $A$. We thus end up with two groups of nodes, all with positive centrality measures, but the centrality of one node is proportional to the sum of the centralities of the nodes from the same group that are connected to it minus the sum of the centralities of the nodes from the opposite group to which it is connected. This leads to meaningful bisections of the network.
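The sign transfer can be made explicit with a short derivation. This is a sketch in which the symbol $s_i$ is introduced here only for illustration. Writing each eigenvector component as $u_i = s_i |u_i|$ with $s_i = \pm 1$ and multiplying the eigenvalue equation by $s_i$ gives

$$\sum_j A_{ij} u_j = \lambda u_i \quad \Longrightarrow \quad \sum_j \left( s_i A_{ij} s_j \right) |u_j| = \lambda |u_i| ,$$

because $s_i^2 = 1$. The vector of absolute values $|u|$ is therefore a non-negative eigenvector, with the same eigenvalue, of the signed matrix with entries $s_i A_{ij} s_j$, which equal $+A_{ij}$ for nodes in the same group and $-A_{ij}$ for nodes in opposite groups.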
FIG. 5. (Color online) Unipartite eigenvalue expansions of the adjacency matrix for the network in Fig. 4. Red (solid) and blue (hollow) dots represent positive and negative matrix entries, respectively.


C. Application to bipartite and directed networks 

To better understand the way the spectral split method works, let us analyze in detail what it does to a bipartite network. The eigenproblem for a bipartite adjacency matrix

$$\begin{pmatrix} 0 & B \\ B^T & 0 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = \lambda \begin{pmatrix} u \\ v \end{pmatrix}, \tag{9}$$

with $B$ of dimensions $m \times n$ is equivalent to

$$(B B^T) u = \lambda^2 u, \tag{10}$$

$$(B^T B) v = \lambda^2 v, \tag{11}$$

and, if we perform a singular value decomposition

$$B = U \Sigma V^T, \tag{12}$$

we find

$$B B^T = U \Sigma^2 U^T, \tag{13}$$

$$B^T B = V \Sigma^2 V^T. \tag{14}$$

The eigensystem of $A$ (nullspace excluded) is thus of the form

$$\lambda_{\pm i} = \pm \sigma_i, \qquad \frac{1}{\sqrt{2}} \begin{pmatrix} U_{:i} \\ \pm V_{:i} \end{pmatrix}, \qquad i = 1, \ldots, r, \tag{15}$$

where $r \le \min(m, n)$ is the rank of $B$.

The full (non-truncated) unipartite and multipartite components of $A$ are then

$$A_U = \sum_{i=1}^{r} \frac{\sigma_i}{2} \begin{pmatrix} U_{:i} \\ V_{:i} \end{pmatrix} \begin{pmatrix} U_{:i}^T & V_{:i}^T \end{pmatrix}, \tag{16}$$

$$A_M = -\sum_{i=1}^{r} \frac{\sigma_i}{2} \begin{pmatrix} U_{:i} \\ -V_{:i} \end{pmatrix} \begin{pmatrix} U_{:i}^T & -V_{:i}^T \end{pmatrix}, \tag{17}$$

or, in terms of $B$,

$$A_U = \frac{1}{2} \begin{pmatrix} \sqrt{B B^T} & B \\ B^T & \sqrt{B^T B} \end{pmatrix}, \tag{18}$$

$$A_M = \frac{1}{2} \begin{pmatrix} -\sqrt{B B^T} & B \\ B^T & -\sqrt{B^T B} \end{pmatrix}, \tag{19}$$

where $\sqrt{M}$ denotes the principal, positive semi-definite root of a positive semi-definite matrix $M$.

The elements of matrices $B B^T$ and $B^T B$ count the number of ways one can travel in two steps from a node in one party to another (or the same) node in the same party. The roots of these matrices act as substitutes for the absent intraparty connections, and their low-rank approximations highlight the sets of nodes that are similarly connected in this way. Bipartite communities appear as negative entries in $A_M$.

The low-rank approximations of the unipartite component additionally highlight similar connections from either side to the other, and nodes from one party together with those from the other party through which they are connected are placed in the same community.

Note that, especially when the bipartite adjacency matrix is not written in the standard form of Eq. (9), the best way to reveal the bipartite communities is to use

$$A_U - A_M = \begin{pmatrix} \sqrt{B B^T} & 0 \\ 0 & \sqrt{B^T B} \end{pmatrix} \tag{20}$$

instead of $A_M$. This prevents the off-diagonal blocks in Eq. (19) from interfering with the bipartite community detection process and also reveals these communities through positive entries, as can be seen in Fig. 2(d).

In the case of directed networks, the asymmetric adjacency matrix plays the role of $B$. Bipartite communities are defined by similarity of only incoming or only outgoing links, whereas unipartite communities are defined based on similarity on either side and also contain the nodes to which the similar connections are made.

D. A modularity-type matrix 

Discarding the first term of the unipartite component $A_U$ can be useful for revealing high-modularity unipartite community structures, which are also less likely to exhibit overlaps. This is because the matrix

$$A_{\{2-N\}} = A - \lambda_1 U_{:1} U_{:1}^T \tag{21}$$

has properties similar to those of the modularity matrix defined in Eq. (3). Since the components of $U_{:1}$ are the eigenvector centralities of the nodes, they are expected to be fairly well correlated with the node degrees. Matrix $A_{\{2-N\}}$ is, in fact, a modularity-type matrix with a different null model, which uses the eigenvector centralities instead of the degrees, and $A_U - \lambda_1 U_{:1} U_{:1}^T$ is its unipartite component. The matrix depicted in Fig. 2(c) represents $A_{\{2-3\}}$ for the network in Fig. 1.

In light of the meaning of the first term in Eq. (5) as an outer product of the classical centrality eigenvector and best rank-1 approximation of the adjacency matrix, we see that $A_{\{1-K\}}$ provides more information about the importance of the nodes and links on the network as a whole, while $A_{\{2-K\}}$ is more focused on distinct communities, the importance of the nodes and links within them, and the possible antagonism between them. It should be noted, however, that keeping the first term does help with the detection of overlapping communities.
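The analogy between the two null models can be illustrated with a short sketch. The toy network below (two triangles joined by one link) is a hypothetical example chosen for this illustration, not a network from the paper.

```python
import numpy as np

# Toy network (hypothetical): two triangles joined by a single link
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

d = A.sum(axis=1)
M = A - np.outer(d, d) / d.sum()       # Eq. (3): null model built from the degrees

lam, U = np.linalg.eigh(A)             # eigenvalues in ascending order
lam1, u1 = lam[-1], U[:, -1]           # leading eigenpair
u1 = u1 if u1.sum() > 0 else -u1       # orient so that the centralities are positive
A_2N = A - lam1 * np.outer(u1, u1)     # Eq. (21): null model built from the centralities

corr = np.corrcoef(d, u1)[0, 1]        # how closely centralities track degrees here
```

The parallel is structural: $M$ annihilates the all-ones vector, while $A_{\{2-N\}}$ annihilates the centrality vector $U_{:1}$. On this small example the centralities and the degrees are almost perfectly correlated; on real networks the correlation is expected to be weaker.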


E. Example network 

For the network in Fig. 1, the truncated unipartite component of the adjacency matrix shown in Fig. 2(a) reveals three communities, comprising nodes {1-4}, {5-8} and {7, 9, 10}. This is consistent with the visual analysis of the network, which suggests the overlap between the latter two communities. Moreover, the importance of the “gateway” link between nodes 3 and 5 as well as the central importance of node 7 are clearly indicated. Other smaller but significant entries indicate the stronger relationship between node 3 and nodes {6, 8} as well as between node 5 and nodes {2, 4}. Finally, the relatively close interaction between sets {6, 8} and {9, 10} is also indicated.

The modularity-type matrix $A_{\{2-3\}}$ is shown in Fig. 2(c). In agreement with the discussion from the previous subsection, this matrix shows non-overlapping communities {1-4}, {5, 6, 8} and {7, 9, 10}. These non-overlapping versions are not so well defined, presumably because of their competing tendencies to include node 7. The antagonism between sets {3, 5} and {7, 9, 10}, which tend to split the set {5-8} in opposite directions, is also revealed.

Figures 2(b) and 2(d) reveal “bipartite” communities {1, 3}, {2, 4}, {5, 7} and {6, 8} defined based on similarity of connection. These figures show nodes 9 and 10 each in a community by itself. This is an indication that the bipartite division of the network fails due to the link between them, with the algorithm providing an exact quadri-partite division instead: {1, 3, 6, 8}, {2, 4, 5, 7}, {9}, and {10}, with the first two parties divided into two communities each.


For sufficiently small networks, up to about 100 nodes, the community structure can be detected by visual inspection of the truncated unipartite and multipartite components of $A$. For larger networks, two more ingredients are needed in order to have an algorithm that can automatically produce near-optimal community structures. The first is a rule for choosing the number of eigenvalues $K$. The second is an algorithm to assign the nodes to communities.


F. Choosing the eigenvalue threshold 

The important structural features of a network are revealed by the most prominent positive or negative eigenvalues of its adjacency matrix and their corresponding eigenvectors. The spectra of all modular graphs examined exhibit (at least at the positive end, if no bipartite structure is discernible) a few prominent eigenvalues separated by one or more large eigengaps from the rest. This is reminiscent of properties observed in the spectrum of the unsigned Laplacian matrix. An example for a well-known network, which is discussed in detail in the Results section, is shown in Fig. 3. Numerical experiments show that the highest modularity partitions are obtained if exactly these eigenvalues are used to approximate $A_U$ or $A_M$.

However, it is important to emphasize that retaining more eigenvalues can be very useful, shedding additional light on the interactions between the nodes, despite the fact that if more eigenvalues are used to partition the network into communities the modularity will be lower. This ability to do a more in-depth analysis of the network structure is an advantage that the spectral split method offers over all community detection methods proposed thus far. Additional research, using methods similar to those described in Ref. [33] and related works, will be required to quantify its resolution limit.

A simple rule that can be used to automatically generate high modularity community structures is to choose the threshold at the rightmost (or leftmost, in the case of the bipartite component) of the three most prominent eigengaps. More sophisticated algorithms can be devised to identify all significant eigengaps but, at least for networks of size up to $N = 1000$, such algorithms seem unnecessary.
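One possible reading of this rule for the unipartite component is sketched below. It assumes that "rightmost" refers to a plot of the eigenvalues against their index in decreasing order, and that only the positive end of the spectrum is considered; the function name `choose_Kp`, the parameter `n_gaps` and the spectrum used in the example are hypothetical.

```python
import numpy as np

def choose_Kp(eigenvalues, n_gaps=3):
    """Number of leading positive eigenvalues to keep, by the eigengap rule."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]   # decreasing order
    lam = lam[lam > 0]                       # positive end of the spectrum only
    if lam.size < 2:
        return int(lam.size)
    gaps = lam[:-1] - lam[1:]                # gaps[k] separates lam[k] from lam[k+1]
    top = np.argsort(gaps)[::-1][:n_gaps]    # positions of the most prominent gaps
    return int(top.max()) + 1                # keep everything before the rightmost one

# Hypothetical spectrum with four prominent positive eigenvalues
spectrum = [6.7, 5.0, 2.9, 2.4, 1.3, 1.2, 1.1, 1.0, 0.9, -1.0, -2.0]
Kp = choose_Kp(spectrum)
```

Here the three largest gaps follow the first, second and fourth eigenvalues, so the rule keeps four eigenvalues. The bipartite component would be treated the same way starting from the negative end of the spectrum.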

The fact that the eigenvalues separated by large eigengaps are sufficient to define the community structure is important from a computational point of view. It is known that the eigenvalues from both ends of the spectrum of a symmetric matrix and the corresponding eigenvectors can be computed by using the Lanczos algorithm much faster than the $O(N^3)$ time required to compute the complete set of eigenvectors if these extremal eigenvalues are separated from the rest by large eigengaps.
G. Assigning the nodes to communities

The following algorithm gives good high-modularity non-overlapping partitions once a low-rank approximation of $A_U$ is computed:

1. Set the negative entries of $A_{\{2-K\}}$ to zero.

2. Perform a second eigenvalue decomposition of the resulting matrix, which has only a few large, positive, eigenvalues with eigenvectors whose positive components are typically much larger than the negative ones.

3. Assume that each eigenvector corresponding to a large eigenvalue represents a community and assign each node corresponding to a positive component to that community, with a strength of the tie equal to the value of the component.

4. If non-overlapping communities are desired, assign each node to the community to which it is connected with the highest strength. For equal strengths, assign the node to the largest of the communities.

It is important to point out that this is just one of many algorithms that could be devised to convert the information provided by the spectral split method into community assignments. It is quite possible that other, faster and better performing, algorithms will be found.

As currently implemented by the author, with two eigenvalue decompositions and without the benefit of the Lanczos algorithm, the spectral split method can be characterized as “intermediately fast”. It is significantly faster than simulated annealing or extremal optimization, which were the two most accurate community detection methods known until now, but slower than the other, less accurate, methods mentioned in the Introduction. However, the results presented in Section III show that spectral split vastly outperforms the faster methods and that it outperforms even extremal optimization in the case of large or highly modular networks. Moreover, using the Lanczos algorithm is expected to result in significant time savings, as discussed in the previous subsection.

For the purpose of comparison, the spectral split method combined with this algorithm was also applied to the classical modularity matrix $M$. Note that, in light of the discussion below Eq. (21), it is meaningless to talk about discarding the first term in the eigenvalue expansion of $M$, and therefore $M_{\{1-K\}}$ replaces $A_{\{2-K\}}$ in this case.

Finally, a refinement that leads to small increases in modularity on some networks is to cube the eigenvalues and construct $A^3_{\{2-K\}}$ or $M^3_{\{1-K\}}$ instead of $A_{\{2-K\}}$ or $M_{\{1-K\}}$. This refinement enhances the contrast between communities defined by close eigenvalues and, even though the improvement is modest, has been used to generate results presented in Section III.


III. RESULTS

TABLE I. Comparison of the modularity values obtained for a few well-known benchmark networks.

Network      N     <d>    lev      ss(A)    ss(M)    Best
Karate       34    4.59   0.3934   0.4174   0.4174   0.4197
Dolphins     62    5.13   0.4912   0.5190   0.5144   0.5285
Lesmis       77    6.60   0.5323   0.5526   0.5469   0.5600
Football     115   10.7   0.4926   0.5889   0.5817   0.6046
Jazz         198   27.7   0.3936   0.4328   0.4402   0.4450
C. elegans   453   8.97   0.3474   0.3394   0.3394   0.4520
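The four node-assignment steps of Sec. II G can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions rather than the author's implementation: the function name `assign_communities`, the parameter `n_communities` (how many of the largest eigenvalues are treated as communities), the tolerance `tol` and the synthetic input matrix are all hypothetical.

```python
import numpy as np

def assign_communities(A_low, n_communities, tol=1e-9):
    """Sketch of steps 1-4 for a symmetric low-rank matrix such as A_{2-K}."""
    P = np.where(A_low > 0, A_low, 0.0)                  # step 1: drop negative entries
    lam, W = np.linalg.eigh(P)                           # step 2: second decomposition
    W = W[:, np.argsort(lam)[::-1][:n_communities]]      # eigenvectors of the largest eigenvalues
    # The sign of an eigenvector is arbitrary: orient each one so that its
    # dominant components are positive.
    W = W * np.where(W.sum(axis=0) >= 0, 1.0, -1.0)
    strength = np.where(W > tol, W, 0.0)                 # step 3: tie strengths
    overlapping = [np.flatnonzero(strength[:, k]) for k in range(strength.shape[1])]
    sizes = np.array([len(c) for c in overlapping])
    labels = np.empty(P.shape[0], dtype=int)             # step 4: non-overlapping version
    for i, row in enumerate(strength):
        best = np.flatnonzero(np.isclose(row, row.max()))
        labels[i] = best[np.argmax(sizes[best])]         # ties go to the largest community
    return overlapping, labels

# Synthetic rank-1 input (hypothetical): positive within two groups, negative between them
v = np.array([3.0, 2.0, 2.0, -1.0, -1.0, -2.0, -1.0])
A_low = np.outer(v, v)
overlapping, labels = assign_communities(A_low, n_communities=2)
```

For this synthetic input the function recovers the two groups, nodes 0-2 and nodes 3-6 (zero-based). Note that in this sketch a node with no positive component in any retained eigenvector is placed in the largest community, which is a choice made here and not one specified in the text.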


We start by presenting results for the larger modular 
network in Fig. 31 which exhibits more features. 

Figure[S](a) shows the low-rank approximation A^i_ 4 y 
based on the four prominent eigenvalues separated by 
large eigengaps from the others. In the upper-left cor¬ 
ner there is a community consisting primarily of nodes 
{1-6}, but including nodes 7 and 8 as well. The central 
importance of nodes {1-3, 5} is clearly indicated, with 
node 5 highlighted as an important gateway node also 
connected to communities {9-11, 14} and {15-17, 19}. 
Node 8, though not an important member of this com¬ 
munity, appears as a gateway node towards community 
{18, 20, 21} to which it has stronger ties. Proceeding 
further down along the main diagonal, we find commu¬ 
nity {9-11, 14} with secondary nodes 12 and 13 attached 
to it and then the strong communities {15-17, 19} and 
{18, 20, 21}. The central importance of the pairs {14, 
15} and {17, 18} as gateway nodes is also highlighted by 
significant off-community entries. 

The full-rank unipartite component $A_U = A_{\{1-8\}}$ is
shown in Fig. 5(b). As expected, the additional terms
included in Eq. (7) provide more detailed information
about the importance of the nodes and of the links between
them. The importance of the nodes can be inferred
from the diagonal elements of the matrix and the
importance of the links from the off-diagonal elements.
For example, within the first community, the importance
of node 5 as a hub is emphasized in a way that distinguishes
it from nodes {1-3}. Its connections with nodes
2, 4, 6, 9 and 17 are more clearly emphasized. The second
community is resolved into two, {9, 11, 14} and {10,
12, 13}, with the link between 9 and 10 highlighted as
an important gateway. A more detailed analysis is left
to the reader, but it is clear that looking at a high-rank
approximation or at the full-rank unipartite matrix provides
a much richer picture of the network’s structure
than a simple partition into communities.
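
The low-rank and full-rank components discussed above can be illustrated numerically. The following is a minimal sketch, assuming that a matrix written as $A_{\{i-j\}}$ denotes the sum of the terms $\lambda_k v_k v_k^T$ over eigenvalues numbered in decreasing order, and that the unipartite and bipartite components collect the positive and negative eigenvalue terms, respectively. The function names and the six-node toy network are hypothetical and are not taken from the paper.

```python
import numpy as np

def sorted_eigenpairs(A):
    """Eigenpairs of a symmetric matrix, ordered by decreasing eigenvalue."""
    lam, V = np.linalg.eigh(A)        # eigh returns eigenvalues in ascending order
    order = np.argsort(lam)[::-1]     # position 0 now holds the largest eigenvalue
    return lam[order], V[:, order]

def component(A, indices):
    """Sum of lam_k * v_k v_k^T over the given 1-based eigenvalue indices."""
    lam, V = sorted_eigenpairs(A)
    idx = np.asarray(list(indices), dtype=int) - 1
    return (V[:, idx] * lam[idx]) @ V[:, idx].T

def split_by_sign(A, tol=1e-12):
    """Return the positive-eigenvalue and negative-eigenvalue parts of A."""
    lam, V = sorted_eigenpairs(A)
    pos, neg = lam > tol, lam < -tol
    A_pos = (V[:, pos] * lam[pos]) @ V[:, pos].T
    A_neg = (V[:, neg] * lam[neg]) @ V[:, neg].T
    return A_pos, A_neg

# Toy network (assumption): two triangles joined by the single link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

A_U, A_B = split_by_sign(A)          # assumed unipartite and bipartite components
A_12 = component(A, [1, 2])          # low-rank approximation from the two largest eigenvalues
```

In this sketch `A_U + A_B` reproduces `A` up to rounding, and the entry of `A_12` linking two nodes of the same triangle is larger than the entry linking nodes of different triangles, which is the kind of affinity pattern read off the figures in the text.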

Finally, matrix $A_{\{2-8\}}$, shown in Fig. 5(c), highlights the
antagonism between nodes {1-6} from the first community
and community {15-17, 19}, as well as between the
latter and community {9-11, 14}.


Detailed results for two well-known benchmark networks,
the unipartite karate network of Zachary [44] and
the bipartite Southern women network [45, 46], are presented next.

FIG. 6. (Color online) The adjacency matrix and three low-rank
unipartite and bipartite components for Zachary’s karate
network. Red (solid) and blue (hollow) dots represent positive
and negative matrix entries, respectively. The dot in the
legend box has unity diameter.

FIG. 7. (Color online) The adjacency matrix and three low-rank
unipartite and bipartite components for the Southern
women network. Red (solid) and blue (hollow) dots represent
positive and negative matrix entries, respectively. The dot in
the legend box has unity diameter.



FIG. 8. (Color online) Ensemble averages of the mutual information
versus the average modularity of the built-in partition
for $N = 300$, $\langle k \rangle = 8$, $k_{\max} = 16$. Results are presented
for the leading eigenvector algorithm (unrefined: continuous
black line, with refining: dotted red line), extremal optimization
with refining (dashed green line), spectral split of M
(dash-dotted blue line), and spectral split of A (dash-dot-dotted
brown line).


A. Zachary’s karate network 

The adjacency matrix for the karate network is shown
in Fig. 6(a) and its eigenvalues in Fig. 3. The four positive
eigenvalues separated by large eigengaps from the
others are the ones that define a high-modularity community
structure. The non-overlapping partition with
the maximum modularity for this network is {1-4, 8, 12-14,
18, 20, 22}, {5-7, 11, 17}, {9, 10, 15, 16, 19, 21, 23,
27, 30, 31, 33, 34} and {24-26, 28, 29, 32}, for which
the Newman modularity is $Q_{\max} = 0.419790$. A quick
inspection of Fig. 6(b) or 6(c) reveals a slightly different
result, with an overlap between the first two communities
at node 1 and an overlap between the last two communities
at node 24. Both of these overlaps make sense
in light of the way nodes 1 and 24 are connected. If
the algorithm described in the previous section is used
to generate a non-overlapping community structure, the
maximum modularity partition described above is reproduced,
with the exception of node 24 being assigned to the
third community, which results in a very slight drop in
modularity to Q = 0.417406. Note, though, that node 24
is connected to only two nodes in the community where
it is placed by maximizing modularity and to three nodes
in the community where it is placed by the spectral split
algorithm.
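
The modularity values quoted above follow from the standard Newman definition, $Q = \sum_c [L_c/m - (d_c/2m)^2]$, where $L_c$ is the number of links inside community $c$, $d_c$ is the total degree of its nodes and $m$ is the total number of links. The sketch below evaluates this formula for a hypothetical six-node network (not the karate network, whose edge list is not reproduced here) and shows how reassigning a single node changes Q. The function name and the toy data are assumptions made for illustration.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Newman modularity of a partition of an unweighted, undirected network."""
    m = len(edges)
    internal = defaultdict(int)       # links with both ends in the same community
    total_degree = defaultdict(int)   # sum of node degrees per community
    for u, v in edges:
        total_degree[community_of[u]] += 1
        total_degree[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(internal[c] / m - (total_degree[c] / (2 * m)) ** 2
               for c in total_degree)

# Toy network (assumption): two triangles joined by the single link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

natural = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
moved = dict(natural)
moved[2] = "b"                        # reassign node 2 to the other community

q_natural = modularity(edges, natural)
q_moved = modularity(edges, moved)
```

For this toy network the natural partition gives Q = 5/14 and the modified one gives Q = 6/49. The drop is much larger than the one reported for node 24 of the karate network, where the two candidate communities are nearly equally well connected to the node.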

FIG. 9. (Color online) Ensemble averages of the mutual information
versus the average modularity of the built-in partition
for $N = 300$, $\langle k \rangle = 20$, $k_{\max} = 40$. Results are presented
for the leading eigenvector algorithm (unrefined: continuous
black line, with refining: dotted red line), extremal optimization
with refining (dashed green line), spectral split of M
(dash-dotted blue line), and spectral split of A (dash-dot-dotted
brown line).

Finally, Fig. 6(d) shows a rank-1 approximation of
the bipartite component of the adjacency matrix, namely
the term corresponding to the most prominent negative
eigenvalue. This splits the network with nodes {1, 2, 3,
17, 25, 26, 33, 34} in one community and the rest of them
in another, which is roughly the two opposite centers of
power connected through the other nodes.
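
To make the rank-1 bipartite term more concrete, the following sketch computes it for a small complete bipartite network and reads a two-way split off the signs of the corresponding eigenvector. This is only an illustration under stated assumptions: the toy network is hypothetical, and splitting by eigenvector sign is a simplification that stands in for, and is not claimed to be, the community assignment algorithm of the previous section.

```python
import numpy as np

# Toy network (assumption): complete bipartite graph with parts {0, 1} and {2, 3, 4}.
A = np.zeros((5, 5))
for i in (0, 1):
    for j in (2, 3, 4):
        A[i, j] = A[j, i] = 1.0

lam, V = np.linalg.eigh(A)            # eigenvalues in ascending order
lam_min, v = lam[0], V[:, 0]          # most negative eigenvalue and its eigenvector
rank1 = lam_min * np.outer(v, v)      # rank-1 term associated with that eigenvalue

# Nodes whose eigenvector entries share a sign fall on the same side of the split.
side = v > 0
groups = [set(map(int, np.flatnonzero(side))),
          set(map(int, np.flatnonzero(~side)))]
```

In this example the entries of `rank1` are positive between nodes on opposite sides and negative between nodes on the same side, so the term rewards links that run across the split, as expected for a bipartite structure.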


B. The Southern women network 

This network is the most frequently used benchmark 
for bipartite community detection algorithms [ 1 , IH, . 
Nodes 1 through 18 represent women, while nodes 19 
through 32 represent events in which they participated. 
The original partition into communities, given by the authors
of Ref. [45], pertains only to women and is an overlapping
one: {1-9} and {9-18}. The adjacency matrix
for this network is shown in Fig. 7(a), while unipartite
and bipartite components for K = 2 are shown in Figs. 7(b)-7(d).

By inspection of Fig. 7(b) we find overlapping unipartite
communities {1-10, 19-27} and {3, 7-18, 25-32},
while Fig. 7(d) reveals overlapping bipartite communities
{1-10}, {3, 7-18}, {19-27} and {25-32}. A more careful
consideration of the link weights shows that the only
significant overlaps between the women communities occur
at nodes 8 and 9, which is in good agreement with the
original partition. Note that in this simple case, where
the network is rigorously bipartite and divided using very
low-rank approximations of the adjacency matrix, the bipartite
communities can be expressed as intersections between
the unipartite communities and either party (the women
or the events). This is not necessarily the case, however,
if higher-rank approximations of the adjacency matrix are
used or if the network is only approximately bipartite.
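
The intersection property stated above can be checked directly with the node sets quoted in this subsection. The short sketch below does so; the helper `span` is a hypothetical convenience function introduced only for this illustration.

```python
def span(*ranges):
    """Build a node set from inclusive (first, last) ranges."""
    nodes = set()
    for first, last in ranges:
        nodes.update(range(first, last + 1))
    return nodes

women = span((1, 18))       # nodes 1 through 18
events = span((19, 32))     # nodes 19 through 32

# Overlapping unipartite communities quoted in the text.
unipartite = [span((1, 10), (19, 27)), span((3, 3), (7, 18), (25, 32))]

# Intersect each unipartite community with each party.
bipartite = [community & party
             for party in (women, events)
             for community in unipartite]

# Raw overlap between the two women communities, before link weights are considered.
women_overlap = bipartite[0] & bipartite[1]
```

The four intersections are {1-10}, {3, 7-18}, {19-27} and {25-32}, which are the bipartite communities listed in the text. The raw overlap between the two women communities is {3, 7, 8, 9, 10}; the text narrows this to nodes 8 and 9 after the link weights are taken into account.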

Matrix $A_{\{2\}}$, which is depicted in Fig. 7(c), reveals
two unipartite non-overlapping communities: {1-7, 19-24}
and {8-18, 25-32}. This result is very close to the
partition obtained in Refs. Si! for the case of division 
into two communities, namely {1-7, 9, 19-26} and {8,
10-18, 27-32}.

With regard to the bipartite communities, the highest
modularity division reported in Ref. is {1-6},
{7, 9, 10}, {8, 16-18}, {11-15}, {19-24}, {25, 26}, {27, 29}
and {28, 30-32}. Similar partitions can be obtained with
the spectral split algorithm if more eigenvalues are included.
For example, using $A_{\{1-3\}} - A_{\{30-32\}}$ we find
partitions {1-7, 9, 10}, {8, 16-18}, {11-15}, {19-25, 27},
{26} and {28-32}.

Finally, Figs. 7(b) and 7(d) also show the higher importance
of nodes {25-27}, which represent events {7-9}
and were attended by many women from both groups
[ 2 ^ . The event communities are actually shown to 

be overlapped at these nodes. 


C. Other benchmark networks 

Table II shows a comparison of the modularities obtained
using the spectral split method applied to the adjacency
matrix and to the modularity matrix, denoted
by ss(A) and ss(M), respectively, with those obtained
using the unrefined leading eigenvector method Q, de¬ 
noted by lev, and with the highest modularity results 
found in literature [ 2 ^. Included are some of the best- 
known networks, namely Zachary’s karate network [44],
the dolphins network of Lusseau et al., the network of interactions
between the characters in Victor Hugo’s “Les
Miserables” [d^, the American college football network 
first studied by Girvan and Newman [d^, the network 
of jazz musicians J^, and the metabolic network of the 
worm C. elegans [50].

With the exception of the C. elegans metabolic network,
both applications of the spectral split algorithm
compare very well with the other methods, and ss(A)
seems generally better than ss(M). Note that the highest
modularity results are typically obtained by simulated
annealing or extremal optimization, which are much
slower methods. The results in Table II suggest that
the spectral split method works better for networks with
higher average degree or higher modularity. They also
seem to hint that the algorithm might not work well for
larger networks.


D. Statistical ensemble results 


To check the validity of these statements and to quantify
the performance of the algorithm, tests were performed
on ensembles of random benchmark networks
generated using the algorithm from Ref. [51]. These are
scale-free networks with a built-in community structure.
They have a number of tunable parameters, which include
the average degree, the maximum degree, and the
mixing parameter $\mu$, which represents the average fraction
of links running between different modules and controls
the average modularity of the statistical ensemble of
networks. The parameters not discussed here were kept
at their default values.

FIG. 10. (Color online) Ensemble averages of the mutual information
versus the average modularity of the built-in partition
for $N = 1000$, $\langle k \rangle = 20$, $k_{\max} = 40$. Results are presented
for the leading eigenvector algorithm (unrefined: continuous
black line, with refining: dotted red line), extremal
optimization with refining (dashed green line), spectral split
of M (dash-dotted blue line), and spectral split of A (dash-dot-dotted
brown line).

Tests were performed on networks of size N between
100 and 1000, average degree $\langle k \rangle$ between 6 and 30 and
maximum degree up to 100. Some of the results are presented
in Figs. 8, 9 and 10. The data points in these
figures represent averages computed over ensembles of
100 networks with fixed values of the mixing parameter
$\mu$. The average mutual information between the computed
and the built-in partitions is plotted versus the
average modularity of the built-in partition. The error
bars represent the standard error of the mean. To obtain
the different points, $\mu$ was varied between 0.1 and 0.6 in
steps of 0.05.
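
The two quantities plotted in these figures can be sketched as follows. The text does not state which normalization of the mutual information is used, so the code below assumes the common choice $2I/(H_a + H_b)$, which equals 1 for identical partitions and 0 for independent ones. The function names are hypothetical, and the standard error is computed from the unbiased sample variance.

```python
from collections import Counter
from math import log, sqrt

def normalized_mutual_information(labels_a, labels_b):
    """Mutual information of two partitions, normalized as 2I / (H_a + H_b)."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mutual = sum(n_ab / n * log(n_ab * n / (count_a[a] * count_b[b]))
                 for (a, b), n_ab in joint.items())
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())
    if h_a + h_b == 0:
        return 1.0                    # both partitions have a single community
    return 2 * mutual / (h_a + h_b)

def mean_and_standard_error(values):
    """Sample mean and standard error of the mean (needs at least two values)."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    return mean, sqrt(variance / n)

# The same partition under different labels, and two independent partitions.
nmi_same = normalized_mutual_information([0, 0, 1, 1, 2, 2], [5, 5, 7, 7, 9, 9])
nmi_indep = normalized_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
mean, sem = mean_and_standard_error([1.0, 2.0, 3.0, 4.0])
```

In an ensemble test of the kind described above, the first function would be applied to each of the 100 networks generated for a given value of $\mu$, and the second to the resulting list of values to obtain a data point and its error bar.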

The spectral split method [both ss(A) and ss(M)] is
compared with three other methods implemented using
the Radatools software package [52]. These are the
leading eigenvector method Q without refining (lev), 
the same method with multiple Kernighan-Lin-like and 
greedy optimization refining a a repeated 10 times 
(heuristics string srfr 10), and the extremal optimiza¬ 
tion method of Ref. Q followed by spectral optimization 
and refining (heuristics string esrfr 1). 

It is clear that increasing network size does not reduce
the ability of the spectral split method to detect the correct
community structure. On the contrary, it is in
the case of large networks that it compares most favorably
with its peers. Note that the N = 300 and N = 1000
networks from the high-modularity ensembles routinely
exhibit 10 to 20 communities. Spectral split is vastly superior
to the unrefined leading eigenvector method, and
it overtakes all the other methods, including extremal optimization,
in the case of networks with significant modularity.

On the other hand, it is true that, without refinement,
the spectral split algorithm falls behind extremal optimization
in the case of low-modularity or very sparse
networks. For networks that are not very sparse, the low
values of modularity at which this happens are comparable
to those of similar random networks, and therefore
it is questionable whether such community structure is
truly meaningful [1].

With regard to speed, we note that, although slower than
less accurate methods, spectral split is faster than extremal
optimization or simulated annealing while offering
comparable accuracy. For example, in the case of
networks of size N = 1000 it is an order of magnitude
faster than extremal optimization even without using the
Lanczos algorithm to compute the eigenpairs.

Finally, ss(M) appears superior to ss(A) on very
sparse networks, but the difference in performance between
the two variants is negligible in all other cases
and decreases with increasing network size. If we also
consider the results obtained in the previous subsection,
which show ss(A) outperforming ss(M) on real-world
networks, we conclude that the comparison between them
is probably a complex issue that depends on many aspects
of network topology.


IV. CONCLUSIONS 

A new method for analyzing the structure of complex
networks was introduced. This method does more than
simply partition the network into communities: it provides
information, at different levels of detail, about the
strengths of the interactions between the nodes. In this
regard, it is useful even without an actual grouping of
the nodes into communities. The spectral split method
introduced in this paper can be applied to the adjacency
matrix, in which case it can reveal both unipartite and
bipartite community structures, but for unipartite networks
it can also be applied to the modularity matrix.
An algorithm is also introduced for the purpose of constructing
the communities. Tests on statistical ensembles
of benchmark networks show that the spectral split
method combined with this algorithm produces excellent
results, especially in the case of large networks or networks
with significant modularity. It is possible that further
research will produce faster and better-performing
community assignment algorithms that will make the
spectral split method even more competitive.